
Add GGML model #617

Merged: 18 commits, merged into EleutherAI:master on Nov 3, 2023
Conversation

@matthoffner commented Jun 26, 2023

This PR adds initial GGML model support based on the llama-cpp-python server. This should support the majority of models, with the potential to use a library like ctransformers for GGML models that do not work with llama.cpp.

@CLAassistant commented Jun 26, 2023

CLA assistant check: all committers have signed the CLA.

@matthoffner mentioned this pull request on Jun 26, 2023
@matthoffner changed the title from "Draft: Add Llama model" to "Draft: Add GGML model" on Jun 26, 2023
@haileyschoelkopf (Contributor) commented

Thanks so much for the contribution! It looks great so far. I left some comments which will hopefully be helpful, and I'm also noting that loglikelihood_rolling is still a TODO.

@matthoffner (Author) commented Jun 27, 2023

Thanks for the comments, @haileyschoelkopf. I'm looking into implementing them today. I will also look into loglikelihood_rolling.

Update: I've implemented loglikelihood_rolling with a test.

@haileyschoelkopf (Contributor) commented

Will try to review this tomorrow! Looks great, thanks so much!

From a quick skim, I think there are just a couple of minor things remaining regarding making sure things actually fail, rather than just logging an error message, if there is no response (or a malformed response) from the server. I can probably go and add those if need be.

A few-line pointer in the library README on which GGML APIs are supported, plus a pointer on where to go to spin up an inference server for models, would be awesome if it's not too much work!

@matthoffner changed the title from "Draft: Add GGML model" to "Add GGML model" on Jul 4, 2023
@matthoffner (Author) commented

Thanks @haileyschoelkopf! I updated the README with the command I'm using locally. I'm pretty new to evaluating models, so I may still be missing some details.

@haileyschoelkopf (Contributor) commented

Thanks! LLaMA-CPP is surprisingly easy to install... do you have any recommendations for models + tasks to test with GGML?

Running the 8-bit LLaMA from https://huggingface.co/TheBloke/LLaMa-7B-GGML gives me 0.2504 accuracy on HellaSwag, which I think is far worse than I'd expect from an 8-bit quantized LLaMA-7B.

@matthoffner (Author) commented Jul 7, 2023

Thanks @haileyschoelkopf, I'm getting similar results.

Update: I took out the reordering code, which seemed to prevent the script from running. I've pushed the changes if anyone else wants to try.

@IgnacioFDM commented

This seems to be broken. I tried 85d3b8d and it fails to run:

Running loglikelihood requests
  0%|                                                                                | 0/40168 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/idm/lm-evaluation-harness/main.py", line 93, in <module>
    main()
  File "/home/idm/lm-evaluation-harness/main.py", line 59, in main
    results = evaluator.simple_evaluate(
  File "/home/idm/lm-evaluation-harness/lm_eval/utils.py", line 243, in _wrapper
    return fn(*args, **kwargs)
  File "/home/idm/lm-evaluation-harness/lm_eval/evaluator.py", line 94, in simple_evaluate
    results = evaluate(
  File "/home/idm/lm-evaluation-harness/lm_eval/utils.py", line 243, in _wrapper
    return fn(*args, **kwargs)
  File "/home/idm/lm-evaluation-harness/lm_eval/evaluator.py", line 289, in evaluate
    resps = getattr(lm, reqtype)([req.args for req in reqs])
  File "/home/idm/lm-evaluation-harness/lm_eval/base.py", line 893, in fn
    rem_res = getattr(self.lm, attr)(remaining_reqs)
  File "/home/idm/lm-evaluation-harness/lm_eval/models/ggml.py", line 47, in loglikelihood
    response = self.ggml_completion(self.base_url, context=context, continuation=continuation)
TypeError: GGMLLM.ggml_completion() got multiple values for argument 'context'

I tried again with the previous commit 0fa2c7d; it runs, but it finishes instantly after a single completion (instead of taking ~30 min as you'd expect for HellaSwag) and returns a clearly incorrect result:

|  Task   |Version| Metric |Value |   |Stderr|
|---------|------:|--------|-----:|---|-----:|
|hellaswag|      0|acc     |0.2504|±  |0.0043|
|         |       |acc_norm|0.2409|±  |0.0043|

@matthoffner marked this pull request as draft on Jul 9, 2023
@matthoffner (Author) commented

Thanks @IgnacioFDM, I'll look into the regression there. The results I'm getting are even worse, although I'm able to run it for more iterations now.

@JettScythe commented

Just tried this out and I'm getting the same issue as @IgnacioFDM. Any luck getting this going, @matthoffner?

@ethanhs (Contributor) commented Aug 25, 2023

It seems llama.cpp is moving to a new GGUF format, so it might be good to change the name to GGUFLM.

@matthoffner marked this pull request as ready for review on Aug 29, 2023
@StellaAthena (Member) commented

@matthoffner Can you post some evals run with the patched implementation, ideally with non-quantized numbers for comparison?

@matthoffner (Author) commented

> @matthoffner Can you post some evals run with the patched implementation, ideally with non-quantized numbers for comparison?

Will try to revisit this week; if anyone else can confirm, that would be great too. Thanks.

@StellaAthena (Member) commented

@matthoffner any luck?

@matthoffner (Author) commented

@StellaAthena No luck. It runs without crashing but the results don't seem accurate.

@StellaAthena (Member) commented

> @StellaAthena No luck. It runs without crashing but the results don't seem accurate.

Hmmm. That's unfortunate, though it could be that the quantization causes performance degradation.

Try manually inspecting a few of the results (esp. the incorrect ones) and then try running the same prompts through the library's native interface. Do you get the same logits when using the model through our library?

@LorenzoMinto commented Oct 31, 2023

Hey @matthoffner, I opened a PR against your fork (matthoffner#1) changing how the loglikelihood for completions is computed, following more closely what is done for the gpt3 and huggingface models.

For llama-2-7b-hf.gguf on winogrande I'm getting something close to what is reported in the paper (there might be some source of imprecision left). Before, I was getting ~0.5 accuracy.

|   Task   |Version|Metric|Value |   |Stderr|
|----------|------:|------|-----:|---|-----:|
|winogrande|      0|acc   |0.6977|±  |0.0129|

Return score from continuation logprobs
@matthoffner (Author) commented

Thanks @LorenzoMinto, I merged your PR.

@LorenzoMinto commented Nov 1, 2023

This is the eval for arc_easy. Also slightly overestimated (the paper reports 75.2), but quite close:

{
  "results": {
    "arc_easy": {
      "acc": 0.7567340067340067,
      "acc_stderr": 0.008804009846865538,
      "acc_norm": 0.7373737373737373,
      "acc_norm_stderr": 0.009029861776763754
    }
  },
  "versions": {
    "arc_easy": 0
  },
  ...
}

@StellaAthena (Member) commented

These look good enough to merge to me, unless we want to confirm with other models and more datasets first.

@StellaAthena added this to the v0.3.0 milestone on Nov 1, 2023
@haileyschoelkopf (Contributor) left a comment

Thank you everyone for the contributions! The posted results look good enough to merge. I will test this locally myself tomorrow; I also left a few nitpicks which I can incorporate prior to merge (in particular, we should avoid silently continuing when the model does not return logprobs as expected).

@haileyschoelkopf merged commit 008fc2a into EleutherAI:master on Nov 3, 2023 (2 checks passed)
@StellaAthena (Member) commented

@haileyschoelkopf Let's update the README to reflect this too!

@haileyschoelkopf (Contributor) commented

Will do! It's already done for the refactor branch PR.

@StellaAthena (Member) commented

@haileyschoelkopf are you sure? It doesn't look like it.

@haileyschoelkopf (Contributor) commented

In #967 (not yet merged) there's the updated model support table with GGUF.

@StellaAthena (Member) commented

> In #967 (not yet merged) there's the updated model support table with GGUF

Sorry, I shouldn't try to read things at midnight.
